Intersect (-c) mRNA, mCpG
Intersect (-c) mRNA, NONmCpG
Intersect (-c) CDS, mCpG
Intersect (-c) CDS, NONmCpG
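A rough sketch of what the four "(-c)" intersects above compute, assuming they refer to a bedtools-style count of overlapping sites per feature (that reading is an assumption, as are the function name and the toy coordinates). Each feature gets one number: how many CpG sites fall inside it, with 0 reported when none do.

```python
# Toy illustration of "-c" style counting: for every feature, count the
# sites that overlap it. Coordinates are 0-based, half-open (BED-style).

def count_overlaps(features, sites):
    """Return {feature_id: number of sites overlapping that feature}."""
    counts = {}
    for chrom, start, end, feature_id in features:
        counts[feature_id] = sum(
            1
            for s_chrom, s_start, s_end in sites
            if s_chrom == chrom and s_start < end and s_end > start
        )
    return counts


# Hypothetical data, not taken from the oyster tables.
features = [
    ("scaffold1", 100, 200, "geneA"),
    ("scaffold1", 300, 400, "geneB"),
    ("scaffold2", 0, 50, "geneC"),
]
sites = [
    ("scaffold1", 110, 112),
    ("scaffold1", 198, 200),
    ("scaffold1", 200, 202),  # starts exactly at geneA's end, so no overlap
    ("scaffold2", 10, 12),
]

counts = count_overlaps(features, sites)
print(counts)
```

The nested loop is fine for a toy case; on genome-scale files a sorted sweep or the real intersect tool is the practical choice.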
Join [SQLshare] (subtract to get intron)
example
SELECT * FROM [sr320@washington.edu].[fish546_module1_blast_table]
INNER JOIN [dhalperi@washington.edu].[gp_association.goa_uniprot]
ON [sr320@washington.edu].[fish546_module1_blast_table].SPID=[dhalperi@washington.edu].[gp_association.goa_uniprot].Column2
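The "join, then subtract to get intron" idea can be tried locally before running it on SQLshare. The sketch below uses Python's built-in sqlite3 with made-up tables `mrna` and `cds` (hypothetical names and values, not the real datasets). It follows the notes' shortcut of treating everything in the mRNA count that is not in CDS as intron, which would also sweep in any UTR sites.

```python
import sqlite3

# In-memory database with hypothetical per-feature mCpG counts.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE mrna (ID TEXT, mCpGcount INTEGER);  -- one row per gene
    CREATE TABLE cds  (ID TEXT, mCpGcount INTEGER);  -- one row per CDS segment
    INSERT INTO mrna VALUES ('geneA', 10), ('geneB', 5);
    INSERT INTO cds  VALUES ('geneA', 3), ('geneA', 4);
    """
)

# Sum the CDS segments per gene, join to the mRNA count, and subtract.
# LEFT JOIN keeps genes that have no CDS rows; COALESCE turns their NULL into 0.
query = """
    SELECT m.ID,
           m.mCpGcount                                   AS mrna_count,
           COALESCE(SUM(c.mCpGcount), 0)                 AS cds_count,
           m.mCpGcount - COALESCE(SUM(c.mCpGcount), 0)   AS intron_count
    FROM mrna AS m
    LEFT JOIN cds AS c ON c.ID = m.ID
    GROUP BY m.ID, m.mCpGcount
    ORDER BY m.ID
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

An inner join, as used elsewhere in these notes, would silently drop genes that have no CDS rows; the left join keeps them with a CDS count of 0.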
Various code
Select Count (mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
#counts - 196691
Select sum (mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
#sum 246609
Select ID, SUM(mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
Group by ID
#AWESOME
Select ID, avg(mCpGcount),min(mCpGcount),max(mCpGcount),sum(mCpGcount),count(mCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_mCpG_2]
Group by ID
Select * From [sr320@washington.edu].[fish546TJGR_mRNA_int_mCpG_2]
Inner join [sr320@washington.edu].[Stats_CDS_int_mCpG]
ON [sr320@washington.edu].[fish546TJGR_mRNA_int_mCpG_2].ID=[sr320@washington.edu].[Stats_CDS_int_mCpG].ID
#Downloaded as Methylated CpG dataset
Select * From [sr320@washington.edu].[fish546TJGR_mRNA_int_NOmCpG_2]
Inner join [sr320@washington.edu].[Stats_CDS_int_NOmCpG]
ON [sr320@washington.edu].[fish546TJGR_mRNA_int_NOmCpG_2].ID=[sr320@washington.edu].[Stats_CDS_int_NOmCpG].ID
#Downloaded as NO methylated dataset
Select ID, avg(NOmCpGcount),min(NOmCpGcount),max(NOmCpGcount),sum(NOmCpGcount),count(NOmCpGcount) FROM [sr320@washington.edu].[fish546TJGR_CDS_int_NOmCpG_2]
Group by ID
Stats_CDS_int_NOmCpG
INTO excel (oh no!)
4042 genes have no Bisulfite Data.
Got intron data; now back in SQLshare
some codes
SELECT * FROM [sr320@washington.edu].[AggCo Oyster Bisulfite mRNA and CDS]
Where "SUM mRNA" > 100
and "Ratio mCDS/mIntron" > 3
Let's get some gene names
SELECT * FROM [sr320@washington.edu].[BSoysterGENE]
Where "SUM mRNA" > 100
and "Ratio mCDS/mIntron" > 3
SELECT * FROM [sr320@washington.edu].[BSoysterGENE]
Where "SUM mRNA" > 100
and "Percent mCpG (CDS)" > 75
and "Percent mCpG (Intron)" < 25
Joining with expression data
Histogram
#read in table
data<-read.csv("/Users/sr320/Desktop/TJGRR/AggCo.csv")
#view
head(data)
library(ggplot2)
qplot(data$Percent.mCpG..CDS., binwidth=1)
qplot(data$Ratio.mCDS.mIntron, binwidth=.1)
ggplot(data, aes(x=mCpGcount)) + geom_histogram(binwidth=.5) + scale_x_continuous(limits = c(0, 300))
ggplot(data, aes(x=mCpGcount)) + geom_histogram(binwidth=.5)
ggplot(data, aes(x=CDScount, y=Ratio.mCDS.mIntron)) + geom_point(shape=1)
Intersect (-c) Expression based on mRNA (need to create)
On iPlant have BAM (accepted hits) one Mgo on genome.
Need to get (split) coverage across mRNA
or
Should be able to just get RPKM
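For reference, RPKM is reads per kilobase of feature per million mapped reads. A minimal sketch of the arithmetic follows, assuming per-mRNA read counts and the library's total mapped reads are already in hand; the function name and the numbers are illustrative only.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Equivalent to read_count / (length_kb * total_reads_in_millions).
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp mRNA in a library of
# 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)
```

Because the value is scaled by both feature length and library size, it can be compared across mRNAs of different lengths, which raw coverage counts cannot.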
--------------------
Take clusters and ID using Blast
----------------
Clusters closest
done
--------------------
Study those mCpG that are same between Gill and Sperm